Compensation for phonetic nuisance variability in speaker recognition using DNNs
نویسندگان
چکیده
In this paper, a new way of using phonetic DNN in textindependent speaker recognition is examined. Inspired by the Subspace GMM approach to speech recognition, we try to extract i-vectors that are invariant to the phonetic content of the utterance. We overcome the assumption of Gaussian distributed senones by combining DNN with UBM posteriors and we form a complete EM algorithm for training and extracting phonetic content compensated i-vectors. A simplified version of the model is also presented, where the phonetic content and speaker subspaces are learned in a decoupled way. Covariance adaptation is also examined, where the covariance matrices are reestimated rather than copied from the UBM. A set of primary experimental results is reported on NIST-SRE 2010, with modest improvement when fused with the standard i-vectors.
منابع مشابه
Variability compensated support vector machines applied to speaker verification
Speaker verification using SVMs has proven successful, specifically using the GSV Kernel [1] with nuisance attribute projection (NAP) [2]. Also, the recent popularity and success of joint factor analysis [3] has led to promising attempts to use speaker factors directly as SVM features [4]. NAP projection and the use of speaker factors with SVMs are methods of handling variability in SVM speaker...
متن کاملUnsupervised Compensation of Intra-Session Intra-Speaker Variability for Speaker Diarization
This paper presents a novel framework for unsupervised compensation of intra-session intra-speaker variability in the context of speaker diarization. Audio files are parameterized by sequences of GMM-supervectors representing overlapping short segments of speech. Session-dependent intra-session intra-speaker variability is estimated in an unsupervised manner, and is compensated using the nuisan...
متن کاملA comparison of session variability compensation techniques for SVM-based speaker recognition
This paper compares two of the leading techniques for session variability compensation in the context of GMM mean supervector SVM classifiers for speaker recognition: inter-session variability modelling and nuisance attribute projection. The former is incorporated in the GMM model training while the latter is employed as a modified SVM kernel. Results on both the NIST 2005 and 2006 corpora demo...
متن کاملI-Vector/PLDA Variants for Text-Dependent Speaker Recognition
The i-vector/PLDA approach currently dominates the field of text-independent speaker recognition and the question of how to translate this methodology to the text-dependent domain has recently become an active area of research. The essential difference between the two fields is that it is possible to do speaker recognition with enrollment and test utterances of very short duration in the text-d...
متن کاملWeighted Nuisance Attribute Projection
Nuisance attribute projection (NAP) has become a common method for compensation of channel effects, session variation, speaker variation, and general mismatch in speaker recognition. NAP uses an orthogonal projection to remove a nuisance subspace from a larger expansion space that contains the speaker information. Training the NAP subspace is based on optimizing pairwise distances to reduce int...
متن کامل